AITopics | parameter shift

Collaborating Authors

parameter shift

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Emergent Abilities of Large Language Models under Continued Pretraining for Language Adaptation

Elhady, Ahmed, Agirre, Eneko, Artetxe, Mikel

arXiv.org Artificial IntelligenceSep-22-2025

Continued pretraining (CPT) is a popular approach to adapt existing large language models (LLMs) to new languages. When doing so, it is common practice to include a portion of English data in the mixture, but its role has not been carefully studied to date. In this work, we show that including English does not impact validation perplexity, yet it is critical for the emergence of downstream capabilities in the target language. We introduce a language-agnostic benchmark for in-context learning (ICL), which reveals catastrophic forgetting early on CPT when English is not included. This in turn damages the ability of the model to generalize to downstream prompts in the target language as measured by perplexity, even if it does not manifest in terms of accuracy until later in training, and can be tied to a big shift in the model parameters. Based on these insights, we introduce curriculum learning and exponential moving average (EMA) of weights as effective alternatives to mitigate the need for English. All in all, our work sheds light into the dynamics by which emergent abilities arise when doing CPT for language adaptation, and can serve as a foundation to design more effective methods in the future.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2506.00288

Country: Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Add feedback

Transfer Learning for High-dimensional Quantile Regression with Distribution Shift

Bai, Ruiqi, Zhang, Yijiao, Yang, Hanbo, Zhu, Zhongyi

arXiv.org Machine LearningNov-29-2024

Information from related source studies can often enhance the findings of a target study. However, the distribution shift between target and source studies can severely impact the efficiency of knowledge transfer. In the high-dimensional regression setting, existing transfer approaches mainly focus on the parameter shift. In this paper, we focus on the high-dimensional quantile regression with knowledge transfer under three types of distribution shift: parameter shift, covariate shift, and residual shift. We propose a novel transferable set and a new transfer framework to address the above three discrepancies. Non-asymptotic estimation error bounds and source detection consistency are established to validate the availability and superiority of our method in the presence of distribution shift. Additionally, an orthogonal debiased approach is proposed for statistical inference with knowledge transfer, leading to sharper asymptotic results. Extensive simulation results as well as real data applications further demonstrate the effectiveness of our proposed procedure.

artificial intelligence, machine learning, source study, (17 more...)

arXiv.org Machine Learning

2411.19933

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Therapeutic Area > Oncology (0.92)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Massive Editing for Large Language Models via Meta Learning

Tan, Chenmien, Zhang, Ge, Fu, Jie

arXiv.org Artificial IntelligenceJan-24-2024

While large language models (LLMs) have enabled learning knowledge from the pre-training corpora, the acquired knowledge may be fundamentally incorrect or outdated over time, which necessitates rectifying the knowledge of the language model (LM) after the training. A promising approach involves employing a hyper-network to generate parameter shift, whereas existing hyper-networks suffer from inferior scalability in synchronous editing operation amount. To mitigate the problem, we propose the MAssive Language Model Editing Network (MALMEN), which formulates the parameter shift aggregation as the least square problem, subsequently updating the LM parameters using the normal equation. To accommodate editing multiple facts simultaneously with limited memory budgets, we separate the computation on the hyper-network and LM, enabling arbitrary batch size on both neural networks. Our method is evaluated by editing up to thousands of facts on LMs with different architectures, i.e., BERT-base, GPT-2, T5-XL (2.8B), and GPT-J (6B), across various knowledge-intensive NLP tasks, i.e., closed book fact-checking and question answering. Remarkably, MALMEN is capable of editing hundreds of times more facts than strong baselines with the identical hyper-network architecture and outperforms editor specifically designed for GPT. Our code is available at https://github.com/ChenmienTan/malmen.

parameter shift, proceedings, tuple, (16 more...)

arXiv.org Artificial Intelligence

2311.04661

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > France (0.04)
Asia > China > Beijing > Beijing (0.04)
(3 more...)

Genre: Research Report (0.85)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

METER: A Dynamic Concept Adaptation Framework for Online Anomaly Detection

Zhu, Jiaqi, Cai, Shaofeng, Deng, Fang, Ooi, Beng Chin, Zhang, Wenqiao

arXiv.org Artificial IntelligenceDec-28-2023

Real-time analytics and decision-making require online anomaly detection (OAD) to handle drifts in data streams efficiently and effectively. Unfortunately, existing approaches are often constrained by their limited detection capacity and slow adaptation to evolving data streams, inhibiting their efficacy and efficiency in handling concept drift, which is a major challenge in evolving data streams. In this paper, we introduce METER, a novel dynamic concept adaptation framework that introduces a new paradigm for OAD. METER addresses concept drift by first training a base detection model on historical data to capture recurring central concepts, and then learning to dynamically adapt to new concepts in data streams upon detecting concept drift. Particularly, METER employs a novel dynamic concept adaptation technique that leverages a hypernetwork to dynamically generate the parameter shift of the base detection model, providing a more effective and efficient solution than conventional retraining or fine-tuning approaches. Further, METER incorporates a lightweight drift detection controller, underpinned by evidential deep learning, to support robust and interpretable concept drift detection. We conduct an extensive experimental evaluation, and the results show that METER significantly outperforms existing OAD approaches in various application scenarios.

anomaly detection, data stream, detection, (15 more...)

arXiv.org Artificial Intelligence

2312.16831

Country:

Europe > Italy > Sicily (0.04)
Asia > Singapore (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.87)

Industry:

Information Technology (0.93)
Banking & Finance (0.92)
Health & Medicine (0.68)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

QOC: Quantum On-Chip Training with Parameter Shift and Gradient Pruning

Wang, Hanrui, Li, Zirui, Gu, Jiaqi, Ding, Yongshan, Pan, David Z., Han, Song

arXiv.org Artificial IntelligenceApr-22-2022

Parameterized Quantum Circuits (PQC) are drawing increasing research interest thanks to its potential to achieve quantum advantages on near-term Noisy Intermediate Scale Quantum (NISQ) hardware. In order to achieve scalable PQC learning, the training process needs to be offloaded to real quantum machines instead of using exponential-cost classical simulators. One common approach to obtain PQC gradients is parameter shift whose cost scales linearly with the number of qubits. We present QOC, the first experimental demonstration of practical on-chip PQC training with parameter shift. Nevertheless, we find that due to the significant quantum errors (noises) on real machines, gradients obtained from naive parameter shift have low fidelity and thus degrading the training accuracy. To this end, we further propose probabilistic gradient pruning to firstly identify gradients with potentially large errors and then remove them. Specifically, small gradients have larger relative errors than large ones, thus having a higher probability to be pruned. We perform extensive experiments with the Quantum Neural Network (QNN) benchmarks on 5 classification tasks using 5 real quantum machines. The results demonstrate that our on-chip training achieves over 90% and 60% accuracy for 2-class and 4-class image classification tasks. The probabilistic gradient pruning brings up to 7% PQC accuracy improvements over no pruning. Overall, we successfully obtain similar on-chip training accuracy compared with noise-free simulation but have much better training scalability. The QOC code is available in the TorchQuantum library.

accuracy, gradient, pruning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3489517.3530495

2202.13239

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
Asia > Philippines > Luzon > National Capital Region > City of Manila (0.05)
(5 more...)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

Add feedback